Abstract. Given a set V of n coloured points on the real line, we study the problem of answering
range α-majority (or "heavy hitter") queries on V. More specifically, for a query range Q, we want
to return each colour that is assigned to more than an α-fraction of the points contained in Q. We
present a new data structure for answering range α-majority queries on a dynamic set of points, where
α ∈ (0, 1). Our data structure uses O(n) space, supports queries in O((lg n)/α) time, and updates in
O((lg n)/α) amortized time. If the coordinates of the points are integers, then the query time can be
improved to O(lg n/(α lg lg n)). For constant values of α, this improved query time matches an existing
lower bound, for any data structure with polylogarithmic update time. We also generalize our data
structure to handle sets of points in d dimensions, for d ≥ 2, as well as dynamic arrays, in which each
entry is a colour.

1 Introduction 

Many problems in computational geometry deal with point sets that have information encoded as colours
assigned to the points. In this paper, we design dynamic data structures for the range α-majority problem, in
which we want to report colours that appear frequently within an axis-aligned query rectangle. This problem
is useful in database applications in which we would like to know typical attributes of the data points in a
query range [23, 24]. For the one-dimensional case, where the points represent time stamps, this problem has
data mining applications for network traffic logs, similar to those of coloured range counting (cf. [I?]).

Formally, we are given a set, V, of n points, where each point p ∈ V is assigned a colour c from a set,
C, of colours. We denote the colour of p as col(p) = c. We are also given a fixed parameter α ∈ (0, 1) that
defines the threshold for determining whether a colour is to be considered frequent. Our goal is to design a
dynamic range α-majority data structure that can perform the following operations:

- Query(Q): We are given an axis-aligned hyperrectangle Q as a query. Let V(Q) be the set {p | p ∈
V ∩ Q}, and V(Q, c) be the set {p | p ∈ V(Q), col(p) = c}. The answer to the query Q is the set of colours
C* such that for each colour c ∈ C*, |V(Q, c)| > α|V(Q)|, and for all c ∉ C*, |V(Q, c)| ≤ α|V(Q)|. We
refer to a colour c ∈ C* as an α-majority for Q, and this type of query as an α-majority query. When
α = 1/2, the problem is to identify the majority colour in Q, if such a colour exists.

- Insert(p, c): Insert a point p with colour c into V.

- Delete(p): Remove the point p from V.
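
As a point of reference for the Query operation, the definition can be evaluated directly by a linear scan. The sketch below is only a brute-force baseline for the one-dimensional case, not one of the data structures of this paper; the function name `alpha_majorities` and the representation of points as `(x, colour)` pairs are assumptions made for illustration.

```python
from collections import Counter

def alpha_majorities(points, alpha, lo, hi):
    """Return the set C* of alpha-majority colours for the range [lo, hi].

    points: iterable of (x, colour) pairs (hypothetical representation).
    A colour c is reported iff |V(Q, c)| > alpha * |V(Q)|.
    """
    in_range = [c for x, c in points if lo <= x <= hi]
    m = len(in_range)                      # |V(Q)|
    counts = Counter(in_range)             # |V(Q, c)| for every colour c
    return {c for c, occ in counts.items() if occ > alpha * m}
```

Such a scan takes time linear in n per query, which is the cost that the data structures in the following sections are designed to avoid.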



1.1 Previous Work 

Static and Dynamic Range α-Majority: In all of the following results, unless mentioned otherwise, the
threshold α ∈ (0, 1) is fixed at construction time, rather than specified for each query individually.

* A preliminary version of this work appeared in the 22nd International Symposium on Algorithms and Computation 
(ISAAC 2011). This work was supported by NSERC of Canada, and NSERC PGS-D Scholarship, and the Canada 
Research Chairs Program. 



Karpinski and Nekrich [23] studied the problem of answering range α-majority queries, which they call
coloured α-domination queries. In the static case, they gave an O(n/α) space data structure that supports
one-dimensional queries in O((lg n lg lg n)/α) time⁴, and an O((n lg lg n)/α) space data structure that
supports queries in O((lg n)/α) time. In the dynamic case, they gave an O(n/α) space data structure for
one-dimensional queries that supports queries and insertions in O((lg² n)/α) time, and deletions in O((lg² n)/α)
amortized time. They also gave an alternative O((n lg n)/α) space data structure that supports queries and
insertions in O((lg n)/α) time, and deletions in O((lg n)/α) amortized time. For points in d dimensions, for
constant d ≥ 2, they gave a static O((n lg^{d−1} n)/α) space data structure that supports queries in O((lg^d n)/α)
time, as well as a dynamic O((n lg^{d−1} n)/α) space data structure that supports queries and insertions in
O((lg^{d+1} n)/α) time, and deletions in O((lg^{d+1} n)/α) amortized time.

Durocher et al. [12] described a static O(n(lg(1/α) + 1)) space data structure that answers range
α-majority queries in an array in O(1/α) time. This data structure is based on the idea that it is possible to
produce a short list of candidate α-majorities for any query, and then efficiently verify the frequencies of
these candidates using succinct data structures. In a later version of the same paper [13], they described
how to extend their technique to d dimensions for constant d ≥ 2, resulting in an O(n lg^{d−1} n) space data
structure that supports range α-majority queries in O(lg^d n/α) time. Gagie et al. [16] improved the static
one-dimensional result to O(n(min(lg(1/α), H) + 1)) space, where H ≤ lg n is the 0th-order empirical entropy of
the sequence stored in the array. The same authors also described how to improve the query time to O(1/β),
when asked for the β-majorities in a query range, for any β ≥ α specified at query time. Recently, for the
two-dimensional static case, Wilkinson [21] presented an improved data structure that occupies O(n lg^ε n lg(1/α))
space, for any constant ε > 0, and can answer queries in O(lg n/α) time.

Approximate Versions of the Problem: Researchers have also examined an approximate version of the range
α-majority problem, in which the solution must contain all the α-majorities in a query range, but can also
contain some false positives. Lai et al. [53] studied the dynamic problem, using the term heavy-colours instead
of α-majorities. They presented a dynamic data structure based on sketching, which provides an approximate
solution with probabilistic guarantees for constant values of α. For one dimension their data structure uses
0(hn) space and supports queries and updates in 0(h\gn) time, where the parameter h — 0( 1& ^ lg( lg J^ )) 
depends on the threshold α, the approximation factor ε, the total number of colours |C|, and the probability
of failure δ. They also noted that the space can be reduced to O(n), at the cost of increasing the query time
to O(h lg n + h²). Thus, for constant values of ε, δ, and α, their data structure uses O(n) space and has
O((lg n lg lg n)²) query and update time in the worst case when lg m = Ω(lg n).

Another approximate data structure based on sketching was proposed by Wei and Yi [30]. Their data
structure uses linear space, answers queries in O(lg n + 1/ε) time, and may return false positives with
relative frequency between α − ε and α. The cost of updates is O(μ lg n lg(1/ε)) amortized time, where μ
is the cost of updating the sketches. We note that this result was obtained independently of ours, and that
both our techniques and the main technique they develop, called exponential decomposability, are similar.
By combining Theorem 4 of their paper with standard range counting data structures, it is not difficult to
get a data structure that occupies linear space, answers queries in O(lg n/α) time, and supports updates in
O((lg n lg(1/α))/α) amortized time for the non-approximate version of the problem that we study. However,
we slightly improve this update time, and also generalize our data structures to higher dimensions, whereas
their structure is part of a more general framework that supports other kinds of aggregate queries.

Lower Bounds: The partial sum problem for threshold functions [21] captures the essence of the dynamic
range α-majority problem: maintain n bits x_1, ..., x_n subject to updates and threshold queries. An update
consists of flipping the bit at a specified index. The answer to query threshold(i) is "yes" if and only if
Σ_{j=1}^{i} x_j ≥ f(i), where f(i) is an integer function such that f(i) ∈ {0, ..., ⌈i/2⌉}. Husfeldt and Rauhe proved
a lower bound [21] on the query time t_q for a data structure that can answer threshold queries with update
time t_u.

⁴ We use lg n to denote ⌊log₂ n⌋.
Any data structure for dynamic α-majority can be used to solve the partial sum problem for threshold
functions. In particular, we can treat the problem as involving n points with integer coordinates 1, ..., n,
with each point having one of two colours. A flip operation can be implemented as a deletion followed by
an insertion. Thus, we can state their lower bound in terms of our problem, denoting the cell size of our
machine as w:

Lemma 1 (Follows from [21], Prop. 4). Let t_u and t_q denote the update and query times, respectively,
for any dynamic α-majority data structure. Then,

$$
t_q = \Omega\left( \frac{\lg(\min\{\alpha n, (1-\alpha)n\})}{\lg\bigl(t_u \, w \, \lg(\min\{\alpha n, (1-\alpha)n\})\bigr)} \right).
$$

This bound suggests that, for constant values of α and word size Θ(lg n) bits, O(lg n/lg lg n) query time
for integer point sets is optimal for any data structure with polylogarithmic update time.
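
To see how the lower bound specializes, the following worked instance substitutes the stated regime into the bound of Lemma 1. It is a sketch under explicit assumptions that are ours, chosen to match the sentence above: α is a constant, w = Θ(lg n), and the update time is t_u = lg^c n for some constant c ≥ 1.

```latex
% Assumptions: \alpha constant, w = \Theta(\lg n), t_u = \lg^{c} n for a constant c \ge 1.
\begin{align*}
\lg(\min\{\alpha n, (1-\alpha) n\})
  &= \lg n + \lg(\min\{\alpha, 1-\alpha\}) = \Theta(\lg n), \\
\lg\bigl(t_u \, w \, \lg(\min\{\alpha n, (1-\alpha) n\})\bigr)
  &= \lg\bigl(\lg^{c} n \cdot \Theta(\lg n) \cdot \Theta(\lg n)\bigr) = \Theta(\lg\lg n), \\
t_q &= \Omega\!\left(\frac{\lg n}{\lg\lg n}\right).
\end{align*}
```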

Other Related Work: Finally, several other results exist for finding α-majorities in the streaming model,
typically referred to as heavy hitters [6, 10, 22, 25]. De Berg and Haverkort [9] studied a similar problem of
reporting τ-significant colours. For this problem, the goal is to output all colours c such that at least a
τ-fraction of all of the points with colour c lie in the axis-aligned query rectangle. More broadly, there
are other data structure problems that deal with coloured points. In coloured range reporting problems, we
are interested in reporting the set of distinct colours assigned to the points contained in an axis-aligned
rectangle. Similarly, in the coloured range counting problem we are interested in returning the number of
such distinct colours. Gupta et al. [20], Bozanis et al. [TJ, and, more recently, Gagie et al. [TS] and Gagie and
Kärkkäinen [T7J studied these problems and presented several interesting results.

1.2 Our Results 

In this paper we present new data structures for the dynamic range α-majority problem in the word-RAM
model with word size Ω(lg n), where n is the number of points in the set V, and α ∈ (0, 1). Our results
are summarized and compared to the previous best results in Table 1. The input column indicates the type
of data we are considering. We use points to denote a set of points on a line with real-valued coordinates
that we can compare in constant time, integers to denote a set of points on a line with word sized integer
coordinates, and array to denote that the input is to be considered a dynamic array, where the positions of
the points are in rank space.
Table 1. Comparison of the results in this paper to the previous best results. For the entries marked with "*" the
running times are amortized.



Our results improve upon previous results in several ways. Most noticeably, all our data structures require
linear space. In order to provide fast query and update times for our linear space structures, we prove several
interesting properties of α-majority colours. We note that the lower bound from Lemma 1 implies that, for
constant values of α, an O(lg n/lg lg n) query time for integer point sets is optimal for any data structure
with polylogarithmic update time, when the word size w = Θ(lg n). Our data structure for points on a line
with integer coordinates achieves this optimal query time.

Our data structures can also be generalized to handle d-dimensional points, improving upon previous
results in the dynamic case [23]. For d ≥ 2, our data structure occupies O(n lg^{d−1} n) space, answers range
α-majority queries in O((lg^d n)/α) time, and supports updates in O((lg^d n)/α) amortized time.

Road Map: In Section 2 we present a dynamic range α-majority data structure for points in one dimension.
In Section 3 we show how to speed up the query time of our data structure in the case where the points have
integer coordinates. In Section 4 we generalize our one-dimensional data structures to higher dimensions.
Finally, in Section 5 we present our data structure for dynamic arrays.

Assumptions About Colours: In the following sections, we assume that we can compare colours in constant
time. In order to support a dynamic set of colours, we employ the techniques described by Gupta et al. [20].
These techniques allow us to maintain a mapping from the set of colours to integers in the range [1, 2n],
where n is the number of points currently in our data structure. This allows us to index into an array using
a colour in constant time.

For the dynamic problems discussed, this mapping is maintained using a method similar to global
rebuilding to ensure that the integer identifiers of the colours do not grow too large [20, Section 2.3]. When
a coloured point is inserted, we must first determine whether we have already assigned an integer to that
colour. By storing the set of known colours in a balanced binary search tree, this can be checked in O(lg |C|)
time; recall that |C| is the number of distinct colours currently assigned to points in our data structure.
Since |C| ≤ n, this cost is absorbed by the update time of our data structure; see Table 1. Therefore, from this
point on, we assume that we are dealing with integers in the range [1, 2n] when we discuss colours.
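
One way to picture such a mapping is sketched below. This is not the scheme of Gupta et al. but a simplified illustration under our own assumptions: a hash map stands in for the balanced binary search tree, the class name `ColourIds` and its methods are hypothetical, and the renumbering step only rebuilds the map itself (a full implementation would also have to relabel the colours stored elsewhere in the data structure). Identifiers are handed out consecutively, and everything is renumbered whenever the largest identifier would exceed 2n.

```python
class ColourIds:
    """Map colours to integer ids in [1, 2n], where n is the current number of points."""

    def __init__(self):
        self.ids = {}      # colour -> integer id
        self.count = {}    # colour -> number of points currently having that colour
        self.n = 0         # number of points currently stored
        self.next_id = 0   # largest id handed out since the last renumbering

    def insert(self, colour):
        """Register one more point of this colour and return the colour's id."""
        self.n += 1
        if colour not in self.ids:
            self.next_id += 1
            self.ids[colour] = self.next_id
            self.count[colour] = 0
        self.count[colour] += 1
        return self.ids[colour]

    def delete(self, colour):
        """Remove one point of this colour, renumbering if ids became too large."""
        self.n -= 1
        self.count[colour] -= 1
        if self.count[colour] == 0:
            del self.count[colour], self.ids[colour]
        if self.next_id > 2 * self.n:
            self._renumber()

    def _renumber(self):
        # Global rebuild: relabel the surviving colours 1..|C|, and |C| <= n.
        for new_id, colour in enumerate(self.ids, start=1):
            self.ids[colour] = new_id
        self.next_id = len(self.ids)
```

After a renumbering the largest identifier is at most n, so a number of deletions proportional to n must occur before the next one, which is the usual amortization argument for global rebuilding.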

2 Dynamic Data Structures in One Dimension 

In one dimension we can interchange the notion of points and x-coordinates in V, since they are equivalent.
Depending on the context we may use either term. Our basic data structure, like that of Karpinski and
Nekrich [23], is a modified weight-balanced B-tree [3]. However, we prove several interesting combinatorial
properties of α-majorities in order to provide more efficient support for queries and updates.

Let T be a weight-balanced B-tree with branching parameter 8 and leaf parameter 1 such that each leaf
represents an x-coordinate in V. From left to right the leaves are sorted in ascending order of the x-coordinate
that they represent. Let T(u) be the subtree rooted at node u. Each internal node u in the tree represents a
range R(u) = [x_min, x_max], where x_min is the x-coordinate represented by the leftmost leaf in T(u), and x_max
is the x-coordinate represented by the rightmost leaf in T(u). We number the levels of the tree 0, 1, ..., Θ(lg n)
from top to bottom. If a node is h levels above the leaf level, we say that this node is of height h. By the
properties of weight-balanced B-trees, the range represented by an internal node of height h contains at least
8^h/2 (except the root) and at most 2 × 8^h points, and the degree of each internal node is at least 2 and at
most 32.

2.1 Supporting Queries 

Given a query Q' = [x'_a, x'_b], we perform a top-down traversal on T to map Q' to the range Q = [x_a, x_b],
where x_a and x_b are the points in V with x-coordinates that are the successor and the predecessor of x'_a and
x'_b, respectively. We call the query range Q general if Q is not represented by a single node of T. We first
define the notion of representing a general query range by a set of nodes:

Definition 1. Given a general query range Q = [x_a, x_b], Q induces a set, I, of nodes in the tree T, satisfying
the following two conditions.

1. The range represented by the parent of each node in I is not entirely contained in Q.

2. For all p ∈ V ∩ Q, there exists some node u ∈ I with p ∈ R(u).

We say that I is the set of nodes in the tree T representing Q.

For each node u ∈ T, we keep a list, L(u), of k candidate colours, i.e., the k most frequent colours in the
range R(u) represented by u, breaking ties arbitrarily. Later, we will fix a value for k. Let L* = ∪_{u∈I} L(u),
i.e., the union of all the candidate lists among the nodes representing the query range Q. For each colour
c ∈ C, we keep a separate range counting data structure, F_c, containing all points p ∈ V with colour c, and
also a range counting data structure, F, containing all of the points in V. Let m be the total number of
points in the range [x_a, x_b], which can be determined by querying F. For each c ∈ L*, we query F_c with the
range [x_a, x_b], letting occ be the result. If occ > αm, then we report that c ∈ C*.

It is clear that I contains at most O(lg n) nodes. Furthermore, if a colour c is an α-majority for Q, then
it must be an α-majority for at least one of the ranges in I [23, Observation 1]. If we set k = ⌈1/α⌉ and
store ⌈1/α⌉ colours in each internal node as candidate colours, then, by the procedure just described, we
will perform a range counting query on O((lg n)/α) colours. If we use balanced search trees for our range
counting data structures, then this takes O((lg² n)/α) time overall. However, in the sequel we show how to
improve this query time by exploiting the fact that the nodes in I that are closer to the root of T contain
more points in the ranges that they represent.
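
The baseline procedure just described (before the improvements that follow) can be sketched as below. This is an illustration under simplifying assumptions of our own, not the paper's structure: the point set is static, a plain balanced binary tree over the sorted points stands in for the weight-balanced B-tree, sorted coordinate lists with binary search stand in for the range counting structures F and F_c, and the names `Node`, `build`, `canonical` and `RangeMajority` are hypothetical. Every node stores its ⌈1/α⌉ most frequent colours, the query collects the candidate lists of the nodes representing Q, and each candidate is verified by counting.

```python
import bisect
import math
from collections import Counter, defaultdict

class Node:
    def __init__(self, lo, hi, left=None, right=None, cands=()):
        self.lo, self.hi = lo, hi            # covers sorted positions [lo, hi)
        self.left, self.right = left, right
        self.cands = cands                   # L(u): the k most frequent colours

def build(points, lo, hi, k):
    counts = Counter(c for _, c in points[lo:hi])
    cands = tuple(c for c, _ in counts.most_common(k))
    if hi - lo == 1:
        return Node(lo, hi, cands=cands)
    mid = (lo + hi) // 2
    return Node(lo, hi, build(points, lo, mid, k), build(points, mid, hi, k), cands)

def canonical(node, lo, hi, out):
    """Collect the maximal nodes whose ranges lie inside positions [lo, hi)."""
    if hi <= node.lo or node.hi <= lo:
        return
    if lo <= node.lo and node.hi <= hi:
        out.append(node)
        return
    canonical(node.left, lo, hi, out)
    canonical(node.right, lo, hi, out)

class RangeMajority:
    def __init__(self, points, alpha):
        self.alpha = alpha
        self.points = sorted(points)                     # (x, colour) pairs
        self.xs = [x for x, _ in self.points]            # stands in for F
        self.by_colour = defaultdict(list)               # stands in for each F_c
        for x, c in self.points:
            self.by_colour[c].append(x)
        k = math.ceil(1 / alpha)
        self.root = build(self.points, 0, len(self.points), k) if self.points else None

    def query(self, xa, xb):
        lo = bisect.bisect_left(self.xs, xa)
        hi = bisect.bisect_right(self.xs, xb)
        m = hi - lo                                      # number of points in Q
        if m <= 0:
            return set()
        nodes = []
        canonical(self.root, lo, hi, nodes)
        candidates = {c for u in nodes for c in u.cands}  # L*
        result = set()
        for c in candidates:
            xs_c = self.by_colour[c]
            occ = bisect.bisect_right(xs_c, xb) - bisect.bisect_left(xs_c, xa)
            if occ > self.alpha * m:
                result.add(c)
        return result
```

For example, with points `[(1, "r"), (2, "r"), (3, "b"), (4, "r")]` and α = 1/2, the call `RangeMajority(points, 0.5).query(1, 4)` reports `{"r"}`, since three of the four points in the range are red.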

We shall prove useful properties of a general query range Q and the set, I, of nodes representing it in
Lemmas 2, 3, 4 and 5. In these lemmas, m denotes the number of points in Q, and i_1, i_2, ... denote the
distinct values of the heights of the nodes in I, where i_1 > i_2 > ⋯ ≥ 0. We first give an upper bound on the
number of points contained in the ranges represented by the nodes of I of a given height:

Lemma 2. The total number of points in the ranges represented by all the nodes in I of height i_j is less
than m × min(1, 31 × 8^{1−j}).

Proof. Since Q is general and contains at least one node of height i_1, m is greater than the minimum number
of points that can be contained in a node of height i_1, which is 8^{i_1}/2. The nodes of I whose height is i_j, j ≥ 1,
are siblings and must have at least one sibling that is not in I. The number of points contained in the interval
represented by this sibling is at least 8^{i_j}/2. Therefore, the number, m_j, of points in the ranges represented by
the nodes of I at level i_j is at most 2 × 8^{i_j+1} − 8^{i_j}/2 = (31/2) × 8^{i_j}. Thus, m_j/m < 31 × 8^{i_j−i_1} ≤ 31 × 8^{1−j}. □

We next use the above lemma to bound the number of points whose colours are not among the candidate 
colours stored in the corresponding nodes in /. 

Lemma 3. Suppose we are given a node v ∈ I of height i_j and a colour c. Let n_v^{(c)} denote the number of
points with colour c in R(v), the range covered by v, if c is not among the first k_j = ⌈k/2^{j−1}⌉ most frequent
candidate colours in the candidacy list of v, and n_v^{(c)} = 0 otherwise. Then Σ_{v∈I} n_v^{(c)} < 5.59m/(k + 1).

Proof. If c is not among the first k_j candidate colours stored in v, then the number of points with colour c
in R(v) is at most 1/(k_j + 1) times the number of points in R(v). Thus,

$$
\sum_{v \in I} n_v^{(c)}
\le \sum_{j=1}^{2} \frac{m}{k_j + 1} + \sum_{j \ge 3} \frac{(31 \times 8^{1-j})\, m}{k_j + 1}
\le \frac{m}{k+1}\left(1 + 2 + 31\left(\frac{2^2}{8^2} + \frac{2^3}{8^3} + \cdots\right)\right)
< \frac{5.59\, m}{k+1}.
$$

□
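
The constant 5.59 in Lemma 3 can be checked numerically. The sketch below is our own arithmetic check, under the assumption that the bound is obtained as follows: heights i_1 and i_2 use the trivial bound m_j < m, heights i_j with j ≥ 3 use the bound 31 × 8^{1−j} × m of Lemma 2, and each term is scaled by 2^{j−1} because k_j + 1 = ⌈k/2^{j−1}⌉ + 1 ≥ (k + 1)/2^{j−1}. The function name `lemma3_constant` is hypothetical.

```python
from fractions import Fraction

def lemma3_constant(terms=200):
    """Partial sum of 1 + 2 + 31 * sum_{j >= 3} 2^(j-1) / 8^(j-1), in exact arithmetic."""
    head = Fraction(1) + Fraction(2)                       # j = 1 and j = 2
    tail = sum(Fraction(31 * 2 ** (j - 1), 8 ** (j - 1))   # j >= 3
               for j in range(3, terms))
    return head + tail

# The infinite series converges to 3 + 31/12 = 67/12, which is below 5.59.
print(float(lemma3_constant()))
```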



We next consider the nodes in I that are closer to the leaf levels. Let I_t denote the nodes in I that are
at one of the top t = ⌈lg_8(1/α) + 2.05⌉ (not necessarily consecutive) levels of the nodes in I. We prove the
following property:

Lemma 4. The number of points contained in the ranges represented by the nodes in I \ I_t is less than
αm/2.

Proof. By Lemma 2, the number of points contained in the ranges represented by the nodes in I \ I_t is less
than:

$$
31m \sum_{j \ge t+1} 8^{1-j}
\le 31m \left(\frac{1}{8^{t}} + \frac{1}{8^{t+1}} + \cdots\right)
\le 31m \left(\frac{8}{7} \times \frac{1}{8^{t}}\right).
$$

Since t ≥ lg_8(1/α) + 2.05, the above value is less than αm/2. □
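
The choice of t in Lemma 4 can be sanity-checked numerically. The sketch below assumes, as in the proof above, that the tail is bounded by 31 × (8/7) × 8^{−t} times m and that t = ⌈lg_8(1/α) + 2.05⌉; the function names `top_levels` and `tail_fraction` are ours.

```python
import math

def top_levels(alpha):
    """t = ceil(lg_8(1/alpha) + 2.05), the number of top levels kept in I_t."""
    return math.ceil(math.log(1 / alpha, 8) + 2.05)

def tail_fraction(t):
    """31 * sum_{j >= t+1} 8^(1-j) = 31 * (8/7) * 8^(-t), as a fraction of m."""
    return 31 * (8 / 7) * 8.0 ** (-t)

for alpha in (0.9, 0.5, 0.1, 0.01, 0.001):
    t = top_levels(alpha)
    # The tail should stay below alpha / 2 for every threshold tried here.
    print(alpha, t, tail_fraction(t) < alpha / 2)
```

The additive constant 2.05 is what makes the worst case work: 31 × (8/7) / 8^{2.05} is just under 1/2.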

With the above lemmas, we can choose an appropriate value for k to guarantee the following property
that is critical to achieve improved query time:

Lemma 5. When k = ⌈11.18/α⌉ − 1, any α-majority colour, c, of the query range Q is among the union of
the first ⌈k/2^{j−1}⌉ candidates stored in each node of height i_j representing a range in I_t.

Proof. The total number of points with colour c in the ranges represented by the nodes in I \ I_t is less than
αm/2 by Lemma 4. By Lemma 3 and our choice for the value of k, less than αm/2 points in the ranges
represented by the nodes in I_t for which c is not a candidate can have colour c. The lemma thus follows. □

For each node v ∈ T, we keep a semi-ordered list of the k candidate colours in the range R(v) represented
by v. The order on the colours for any candidacy list is maintained such that the most frequent ⌈k/2^{j−1}⌉
colours come first, for all j = 2, 3, ..., arbitrarily ordered within their positions. Note that such a
semi-ordering can be obtained in O(k) time by repeated median queries. That is, by using a linear time median
finding algorithm [5], we can partition the list so that the first half of the list contains the k/2 most frequent
colours, and then recurse on the first half of the list until the list has 1 element. In total, this takes O(k +
k/2 + k/4 + ⋯) = O(k) time.
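
The repeated-selection idea can be sketched as follows. As an assumption made for brevity, a randomized quickselect-style partition (expected linear time) stands in for the deterministic linear-time median algorithm cited above; the function names `top_first` and `semi_order` and the representation of a candidate as a `(colour, frequency)` pair are ours.

```python
import random

def top_first(items, q):
    """Reorder (colour, frequency) pairs so the q most frequent come first."""
    if q <= 0 or q >= len(items):
        return list(items)
    pivot = random.choice(items)[1]
    high = [it for it in items if it[1] > pivot]
    same = [it for it in items if it[1] == pivot]
    low = [it for it in items if it[1] < pivot]
    if q <= len(high):
        return top_first(high, q) + same + low
    if q <= len(high) + len(same):
        return high + same + low
    return high + same + top_first(low, q - len(high) - len(same))

def semi_order(cands):
    """Semi-order a candidate list of length k: for every j >= 1, the first
    ceil(k / 2^(j-1)) entries are the ceil(k / 2^(j-1)) most frequent colours."""
    out = list(cands)
    p = len(out)
    while p > 1:
        q = (p + 1) // 2                 # ceil(p / 2), next prefix length
        out[:p] = top_first(out[:p], q)  # only permutes within the current prefix
        p = q
    return out
```

Each round touches a prefix half as long as the previous one, which gives the geometric sum O(k + k/2 + k/4 + ⋯) = O(k) in expectation for this randomized variant.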

By setting k = ⌈11.18/α⌉ − 1, Lemma 5 implies that the colours that we have checked are the only
possible α-majority colours for the query. Furthermore, Lemma 2 implies that we need only check the nodes
on the top O(lg(1/α)) levels in I. Let I_t denote the set of nodes in these levels. We present the following
lemma:

Lemma 6. The data structures described in this section occupy O(n) words, and can be used to answer a
range α-majority query in O((lg n)/α) time with the help of an additional array of size 2n.

Proof. To support α-majority queries, we only consider the nontrivial case in which the query range Q is
general. By Lemma 5, the α-majorities can be found by examining the first ⌈k/2^{j−1}⌉ candidate colours stored
in each node representing a range in I_t. Thus, there are at most O(⌈1/α⌉ + ⌈1/(2α)⌉ + ⌈1/(4α)⌉ + ⋯ + ⌈1/(2^{t−1}α)⌉) = O(1/α)
relevant colours to check. Let L_t denote the set of these colours. For each c ∈ L_t we query our range counting
data structures F_c and F in O(lg n) time to determine whether c is an α-majority. Thus, the overall query
time is O((lg n)/α).

There are O(n) nodes in the weight-balanced B-tree. Therefore, one would expect the space to be O(n/α)
words, since each node stores O(1/α) colours. We use a pruning technique on the lower levels of the tree in
order to reduce the space to O(n) words overall. If a node u covers less than 1/α points, then we need not
store L(u), since every colour in T(u) is an α-majority for R(u). Instead, during a query, we can traverse
the leaves of T(u) in order to determine the unique colours. To make this efficient, we require an array D of
size 2n integers to count the frequencies of the colours in R(u). As mentioned in Section 1.2, we can map a
colour into an index of the array D, which allows us to increment a frequency counter in O(1) time. Thus, we
can extract the unique colours in R(u) in O(|T(u)|) = O(1/α) time. The number of tree nodes whose subtrees
have at least 1/α leaves is O(nα). Thus, we store O(k) = O(1/α) words in O(nα) nodes, and the total space
used by our B-tree T is O(n) words. The only other data structures we make use of are the array D and the
range counting data structures F and F_c for each c ∈ C, and together these occupy O(n) words. □
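
To make the query-time argument in the proof of Lemma 6 concrete, the sketch below counts how many candidate colours are examined when one node is charged per height, with k = ⌈11.18/α⌉ − 1 and t = ⌈lg_8(1/α) + 2.05⌉. This is an illustration under our own simplification: the set I may contain a constant number of nodes per height, which only changes the constant. The function name `candidates_checked` is hypothetical.

```python
import math

def candidates_checked(alpha):
    """Return (k, t, total), where total = sum_{j=1..t} ceil(k / 2^(j-1))."""
    k = math.ceil(11.18 / alpha) - 1
    t = math.ceil(math.log(1 / alpha, 8) + 2.05)
    per_level = [math.ceil(k / 2 ** (j - 1)) for j in range(1, t + 1)]
    return k, t, sum(per_level)

for alpha in (0.5, 0.1, 0.01):
    k, t, total = candidates_checked(alpha)
    # Each ceiling adds at most 1, so total < 2k + t = O(1/alpha).
    print(alpha, k, t, total, total <= 2 * k + t)
```

The halving prefix lengths are what keep the total at O(1/α) rather than O(lg(1/α)/α).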
2.2 Supporting Updates



We next establish how much time is required to maintain the list L(v) in node v under insertions and
deletions. We begin by observing that it is not possible to lazily maintain the list of the top k = ⌈11.18/α⌉ − 1
most frequent colours in each range: many of these colours could have low frequencies, and the list L(v)
would have to be rebuilt after very few insertions or deletions. To circumvent this problem, we relax our
requirements on what is stored in L(v), only guaranteeing that all of the β-majorities of the range R(v) must
be present in L(v), where β = ⌈11.18/α⌉^{−1}. With this alteration, we can still make use of the lemmas from the
previous section, since they depend only on the fact that there are no colours c ∉ L(v) with frequency greater
than β|T(v)|. The issue now is how to maintain the β-majorities of R(v) during insertions and deletions of
colours.

Karpinski and Nekrich noted that if we store the (β/2)-majorities for each node v in T, then it is only
after |T(v)|β/2 deletions that we must rebuild L(v) [23]. For the case of insertions and deletions, their data
structure performs a range counting query at each node v along the path from the root of T to the leaf
representing the inserted or deleted colour c. This counting query is used to determine if the colour c should
be added to, or removed from, the list L(v).

In contrast, our strategy is to be lazy during insertions and deletions, waiting as long as possible before
recomputing L(v), and to avoid performing range counting queries for each node in the update path. We
provide a tighter analysis (to constant factors) of how many insertions and deletions can occur before the
list L(v) is to be rebuilt. One caveat is that the results in this section only apply when β ∈ (0, 1/2]. However,
since α < 1, our choice of β satisfies this condition.

We use Z* to denote Z⁺ ∪ {0}. The following lemma is used to show a lower bound on the number of
update operations (insertions and deletions) that can occur before a list needs to be recomputed:

Lemma 7. Let r(ℓ, j, β) = min_{n_i ∈ Z*, n_d ∈ Z*} { n_i + n_d | (j + n_i)/(ℓ + n_i − n_d) > β }, where ℓ ∈ Z⁺, j ∈ Z*, j < ℓ, and
β ∈ (0, 1/2]. If ℓ > 2j + 1, then r(ℓ, j, β) > (βℓ − j)/(1 − β).

Proof. Observe that (j + 1)/(ℓ + 1) > j/(ℓ − 1) if and only if ℓ > 2j + 1. This implies that increasing n_i rather than n_d by the
same amount increases the value of the ratio (j + n_i)/(ℓ + n_i − n_d) by a greater amount when ℓ + n_i − n_d > 2(j + n_i) + 1.
Thus, we have

r(ℓ, j, β) = min_{n_i ∈ Z*} { n_i | (j + n_i)/(ℓ + n_i) > β }

if ℓ + i > 2(j + i) + 1 for 1 ≤ i < r(ℓ, j, β). Also observe that (j + n_i)/(ℓ + n_i) > β implies n_i > (βℓ − j)/(1 − β). All that remains is
to show that for β ∈ (0, 1/2] the constraint ℓ + i > 2(j + i) + 1 is satisfied for 1 ≤ i < r(ℓ, j, β). To show this,
we observe that if i < ℓ − 2j, then ℓ + i > 2(j + i) + 1. Thus, the constraint is satisfied if r(ℓ, j, β) < ℓ − 2j.
Since (βℓ − j)/(1 − β) ≤ ℓ − 2j for all β ∈ (0, 1/2], we get the desired bound. □

We can think of the variables n_i and n_d as the numbers of insertions and deletions into our data structure.
Thus, r(ℓ, j, β) represents the number of updates that can occur in a range containing ℓ points before a colour
c with j occurrences can possibly become a β-majority. We next prove the following lemma:
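
The quantity discussed here can be explored by brute force. The sketch below is our own reading, stated as an assumption: r(ℓ, j, β) is the smallest total n_i + n_d of insertions of colour c and deletions of other colours such that (j + n_i)/(ℓ + n_i − n_d) exceeds β, and the bound being compared against is (βℓ − j)/(1 − β). We additionally assume that only points of other colours are deleted, so n_d ≤ ℓ − j. The function name `r_bruteforce` is hypothetical, and exact rational arithmetic avoids rounding at the threshold.

```python
from fractions import Fraction

def r_bruteforce(l, j, beta):
    """Smallest n_i + n_d with (j + n_i) / (l + n_i - n_d) > beta.

    n_i counts insertions of colour c; n_d counts deletions of points of
    other colours (an assumption), so n_d is capped at l - j.
    """
    total = 0
    while True:
        for n_i in range(total + 1):
            n_d = total - n_i
            size = l + n_i - n_d
            if size > 0 and n_d <= l - j and Fraction(j + n_i, size) > beta:
                return total
        total += 1

l, j, beta = 100, 10, Fraction(1, 4)
bound = (beta * l - j) / (1 - beta)
print(r_bruteforce(l, j, beta), bound)
```

For β < 1/2, insertions raise the ratio faster than deletions, which matches the observation in the proof that the minimum is attained with insertions only.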

Lemma 8. Suppose the list L(v) for node v contains the ⌈(1 − β + √(1 − β))/β⌉ most frequent colours in the range
R(v), breaking ties arbitrarily. For β ∈ (0, 1/2], this value is upper bounded by ⌈2/β⌉. Let ℓ be the number of
points contained in R(v). Only after ⌈βℓ/(1 − β + √(1 − β))⌉ ≥ ⌈βℓ/2⌉ insertions or deletions into T(v) can a colour
c ∉ L(v) possibly become a β-majority for the range spanned by node v.

Proof. Since we keep in L(v) the k most frequently appearing colours in the range R(v), any colour not in L(v)
can appear at most ℓ/(k + 1) times. We apply Lemma 7, noting that it exactly describes the number of insertions or
deletions required to cause a colour with frequency ℓ/(k + 1) to become a β-majority in a range containing ℓ points.
Thus, we get that r(ℓ, ℓ/(k + 1), β) > (β(k + 1) − 1)ℓ/((1 − β)(k + 1)). We want to maximize the ratio r(ℓ, ℓ/(k + 1), β)/k, which gives us
the maximum number of updates before rebuilding L(v) per colour stored in v. If h(k) = (β(k + 1) − 1)ℓ/((1 − β)k(k + 1)), then the
derivative h'(k) = ℓ(−βk² − 2(β − 1)k − β + 1)/((1 − β)k²(k + 1)²), which has zeros at k = {(1 − β − √(1 − β))/β, (1 − β + √(1 − β))/β}. The relevant zero,
which maximizes h(k), is k = (1 − β + √(1 − β))/β. Substituting this in as k into (β(k + 1) − 1)ℓ/((1 − β)(k + 1)), we get that βℓ/(1 − β + √(1 − β))
updates are required before a colour c ∉ L(v) can become a β-majority for the range spanned by node v. □

By Lemma 8, our lazy updating scheme only requires each list L(v) to have size ⌈(1 − β + √(1 − β))/β⌉ = O(1/α).
This leads to the following theorem:

Theorem 1. Given a set V of n points in one dimension and a fixed α ∈ (0, 1), there is an O(n) space data
structure that supports range α-majority queries on V in O((lg n)/α) time, and insertions and deletions in
O((lg n)/α) amortized time.

Proof. Query time follows from Lemma 6. In order to get the desired space, we combine Lemmas 6 and
8, implying that each list L(v) contains O(1/α) colours. This allows us to use the same pruning technique
described in the proof of Lemma 6 in order to reduce the space to O(n).

When an update occurs, we follow the path from the root of T to the updated node u. Suppose, without
loss of generality, the update is an insertion of a point of colour c. For each vertex v on the path, if v contains
a list L(v), we check whether c is in L(v). If it is, then we increment the count of colour c. This takes O(1/α)
time. We also increment the counter for v that keeps track of the number of updates into T(v) that have
occurred since L(v) was rebuilt. Thus, modifying the lists and counters along the path requires O((lg n)/α)
time in the worst case.

We next look at the costs of maintaining the lists L(v). The list L(v) can be rebuilt in O(|T(v)|) time,
using the array D. Note that D can be maintained under updates using the same scheme described in
Section 1.2. First, we use D to compute the frequency of all the colours in R(v) in O(|T(v)|) time. Let k
be the value from Lemma 5. Since there are at most O(|T(v)|) colours, we can use a linear time selection
algorithm to find the k-th most frequent colour in D, and then find the top k most frequent colours via a
linear scan in O(|T(v)|) time. We can then enforce the necessary semi-ordering on this list in O(k) = O(1/α)
time, as described in Section 2.1. Thus, each leaf in T(v) pays O(1) cost every O(|T(v)|α) insertions, or
O(1/α) amortized cost per insertion. Since each update may cause O(lg n) lists to be rebuilt, this increases
the cost to O((lg n)/α) amortized time per update.

We make use of standard local rebuilding techniques to keep the tree T balanced, rebuilding the lists in
nodes that are merged or split during an update. Since a node v will only be merged or split after O(|T(v)|)
updates by the properties of weight-balanced B-trees, these local rebuilding operations require O(lg n/α)
amortized time. Finally, we can update F_c and F during an insertion or deletion of a point of colour c in
O(lg n) time. Thus, updates require O((lg n)/α) amortized time overall, and are dominated by the costs of
maintaining the lists L(v) in each node v. □
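
The rebuilding step for a single list L(v) in the proof above can be sketched as follows. This is a simplified illustration under assumptions of ours: colours are already integers usable as indices into the shared counting array D, `heapq.nlargest` stands in for the linear-time selection algorithm (so this version runs in O(|T(v)| lg k) rather than O(|T(v)|) time), and the function name `rebuild_candidates` is hypothetical.

```python
import heapq

def rebuild_candidates(leaf_colours, D, k):
    """Recompute the k most frequent colours among the leaves of T(v).

    leaf_colours: colour ids of the leaves of T(v), each a valid index into D.
    D: shared counting array, all zeros on entry and again on exit.
    Returns a list of (colour, frequency) pairs, most frequent first.
    """
    touched = []                       # distinct colours seen, for the cleanup pass
    for c in leaf_colours:
        if D[c] == 0:
            touched.append(c)
        D[c] += 1
    top = heapq.nlargest(k, touched, key=lambda c: D[c])
    result = [(c, D[c]) for c in top]
    for c in touched:                  # reset only the entries that were used
        D[c] = 0
    return result
```

Resetting only the touched entries keeps the cost proportional to |T(v)| rather than to the size of D, so one array can be shared by all nodes.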



3 Speedup for Integer Coordinates 

We next describe how to improve the query time of the data structure from Theorem 1 from O((lg n)/α) to
O(lg n/(α lg lg n)) for the case in which the x-coordinates of the points in V are integers that can be stored
in a constant number of words.

To accomplish this goal, we require an improved one-dimensional range counting data structure, which
we get by combining two existing data structures. The fusion tree of Fredman and Willard [15] is an O(n)
space data structure that supports predecessor and successor queries in O(lg n/lg lg n) time and
insertions/deletions in O(lg n/lg lg n) amortized time. The list indexing data structure of Dietz [11] uses O(n)
space and supports rank queries (i.e., given an element, return the number of elements that precede it in
the list) in O(lg n/lg lg n) time, and insertions/deletions in O(lg n/lg lg n) amortized time. In Andersson et
al. PP, it was observed that these data structures could be combined to support dynamic one-dimensional
range counting queries in O(lg n/lg lg n) time per operation; amortized for updates. We refer to this data
structure as an augmented fusion tree.
In order to achieve O(lg n/(α lg lg n)) query time, we implement all the range counting data structures as
augmented fusion trees: i.e., the data structures F, and F_c for each c ∈ C. Immediately, we get that we can
perform a query in O(lg n/(α lg lg n) + lg n) time: O(lg n/(α lg lg n)) time for the range counting queries, and
O(lg n) time to find the nodes in I_t. We now discuss how to remove the additive O(lg n) term, which involves
modifying our weight-balanced B-tree to support dynamic lowest common ancestor queries. To identify the
top O(lg(1/α)) levels of I, we use the following lemma:

Lemma 9. The weight-balanced B-tree T can be augmented in order to support lowest common ancestor
queries in O(√(lg n)) time without changing the O((lg n)/α) amortized time required for updates.

Proof. Let the first ancestor of a node u be the parent of u, and the ℓ-th ancestor of u be the parent of the
(ℓ − 1)-th ancestor of u for ℓ > 1. In order to support lowest common ancestor queries between two nodes
z_a and z_b, denoted LCA(z_a, z_b), we add three pointers to each node u ∈ T: pointers to both the leaves
representing the minimum and maximum x-coordinates in T(u), and a pointer to the ℓ-th ancestor of u; we
will fix the value of ℓ later. We can search for LCA(z_a, z_b) by setting v = z_a and following the pointer
to the ℓ-th ancestor of v, denoted v'. By checking the maximum x-coordinate to see if R(v') contains z_b, we
can determine whether v' is an ancestor of LCA(z_a, z_b) or a descendant of LCA(z_a, z_b) in constant time. If
v' is a descendant of LCA(z_a, z_b), then we set v to v' and v' to the ℓ-th ancestor of v'. If v' is an ancestor
of LCA(z_a, z_b), then we backtrack and walk up the path from v to v' until we find LCA(z_a, z_b). Overall,
it takes O(h_0/ℓ + ℓ) time to find node z = LCA(z_a, z_b), if z is at height h_0 in T. By setting ℓ = Θ(√(lg n))
we get O(√(lg n)) time. Furthermore, the pointers we added to T can be updated in O(lg n) amortized time
during an insertion or deletion. Whenever we merge or split a node u, we have O(|T(u)|) time to fix all of
the pointers into u, by properties of weight-balanced B-trees. The pointers out of u can be fixed in O(lg n)
worst case time. □
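
The jump-then-walk search in the proof can be sketched in Python on a static tree. This is a sketch under stated assumptions, not the dynamic structure of the lemma: the tree is built once, each node stores the interval of leaf indices below it (standing in for the pointers to its minimum and maximum leaves), and the names Node, build, set_jumps, covers and lca are hypothetical.

```python
class Node:
    def __init__(self, lo, hi, children=()):
        self.lo, self.hi = lo, hi   # indices of the leftmost and rightmost leaves below this node
        self.parent = None
        self.jump = None            # pointer to the ell-th ancestor, if it exists
        self.children = list(children)
        for c in self.children:
            c.parent = self


def build(lo, hi, fanout=2):
    """Build a balanced tree over leaves lo..hi (inclusive) and return its root."""
    if lo == hi:
        return Node(lo, hi)
    step = -(-(hi - lo + 1) // fanout)  # ceiling division
    kids = [build(s, min(s + step - 1, hi), fanout) for s in range(lo, hi + 1, step)]
    return Node(lo, hi, kids)


def set_jumps(root, ell):
    """Store in every node a pointer to its ell-th ancestor."""
    path = []  # ancestors of the node currently being visited, root first

    def dfs(u):
        if len(path) >= ell:
            u.jump = path[-ell]
        path.append(u)
        for c in u.children:
            dfs(c)
        path.pop()

    dfs(root)


def covers(u, z):
    """True if leaf z lies in the subtree of u."""
    return u.lo <= z.lo and z.hi <= u.hi


def lca(za, zb):
    v = za
    # Jump phase: keep jumping while the jump target is still below the LCA.
    while v.jump is not None and not covers(v.jump, zb):
        v = v.jump
    # Walk phase: fewer than ell + 1 parent steps reach the LCA.
    while not covers(v, zb):
        v = v.parent
    return v


def leaves(u):
    return [u] if not u.children else [z for c in u.children for z in leaves(c)]
```

With ell set near the square root of the tree height, both phases take a number of steps proportional to that square root, matching the O(h_0/ℓ + ℓ) accounting above.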

Although Lemma 9 is weaker than other results, it is simple and sufficient for our needs. We
next present the following theorem:

Theorem 2. Given a set V of n points in one dimension with integer coordinates and a fixed α ∈ (0, 1),
there is an O(n) space data structure that supports range α-majority queries on V in O(lg n/(α lg lg n)) time,
and both insertions and deletions into V in O((lg n)/α) amortized time.

Proof. Suppose we are given a query range [x_a, x_b]. Applying Lemma 9 to the weight-balanced B-tree T, we
claim that we can identify the top ℓ levels of I (which are not necessarily from consecutive levels in T) using
O(ℓ) lowest common ancestor operations. To show this, we describe a recursive procedure FindTop(z_a, z_b, ℓ)
for identifying the top ℓ levels of I. We assume that we have acquired pointers to z_a and z_b, the leaves of
T that represent the x-coordinates of the successor of x_a and predecessor of x_b, respectively. To do this, we
add a pointer from each leaf in the augmented fusion tree F to its corresponding leaf in T. Given a query, we
initially perform a successor query for x_a and predecessor query for x_b in F, and follow these extra pointers
to z_a and z_b, respectively. We assume that z_a ≠ z_b, otherwise the query is trivially answered by reporting
the colour stored in z_a.

Let z = LCA(z_a, z_b), and c_i denote the i-th child of z. Let z_l and z_r denote the leftmost and rightmost
leaves in T_z. In constant time we can determine children c_j and c_k of z which are on the path to z_a and z_b,
respectively. Note that k − j > 0, otherwise z is not LCA(z_a, z_b). We say we are in the good case when
z_a = z_l, z_b = z_r, and/or k − j > 1. When we are in the good case, either c_j, c_k, and/or c_{j+1}, ..., c_{k−1} are in
the top level of I, and we set ℓ' = ℓ − 1. Otherwise, if k − j = 1 and z_a ≠ z_l and z_b ≠ z_r, then we are in
the bad case. In the bad case we have not found the top level of I, and we set ℓ' = ℓ. In both cases (good
or bad), let z_{b'} be the leaf representing the minimum x-coordinate in T_{c_k}, and z_{a'} be the leaf
representing the maximum x-coordinate in T_{c_j}. We recurse if ℓ' > 0, calling FindTop(z_a, z_{a'}, ℓ') if z_a ≠ z_{a'}
and FindTop(z_{b'}, z_b, ℓ') if z_b ≠ z_{b'}.

We observe that the procedure FindTop(z_a, z_b, ℓ) uses O(ℓ) lowest common ancestor queries. This is
because if a call to FindTop is in the bad case, then the subsequent recursive call(s) will be in the good case
by choice of z_{a'} and z_{b'}, and only the initial call to FindTop can make two recursive calls. Using FindTop,
we can identify the top O(lg(1/α)) levels of I in O(√(lg n) × lg(1/α)) time, replacing the O(lg n) additive term. This
term is strictly asymptotically less than the time required to perform the range-counting queries, which is
O(lg n/(α lg lg n)).

By Lemma 9, we can support the lowest common ancestor operation without increasing the update
time of T as stated in Theorem 1. The extra pointers we added from the leaves of F to the leaves of T can
also be updated without affecting the bound from Theorem 1, since during any insertion/deletion of a point
p, the two leaves corresponding to p in both F and T must be located. Therefore, the total update time
follows from Theorem 1. □



4 Extension to Higher Dimensional Point Sets 

In this section we present a refinement of the technique presented by Karpinski and Nekrich [23], who used
standard range tree techniques [4] to generalize their range α-majority structures to higher dimensions.^5
We note that, recently, Wilkinson [31] has used the same refinement to improve the bounds of Durocher et
al. [13] for the two-dimensional static case.

All of the algorithms presented thus far have the following two-phase structure. The first is the candidate
extraction phase, in which we extract a list of candidates from our data structure. The second phase is the
verification phase, in which we use range counting data structures to verify that the candidates are actual
α-majorities. For higher dimensional problems we speed up the verification phase by adding an additional
filtering phase between the candidate extraction and verification phases.

In order to do this, we make use of approximate range counting data structures [26, 28, 31]. If m points
are contained in the query range, then an approximate range counting data structure with additive error m'
will return a count in the range [m − m', m + m']; see [28]. Similarly, a data structure with multiplicative
error (1 − ε) will return a count in the range [(1 − ε)m, m]; see [26]. In the remainder of this section we
first modify existing data structures for approximate range counting, and then consider their applications to
higher dimensional data structures for dynamic range α-majority queries.

4.1 Approximate Range Counting 

Before stating the results for higher dimensional range α-majority data structures we require some additional
results on approximate range counting. We begin with a lemma, which is a very minor generalization
of Nekrich's one-dimensional approximate range counting data structure [28, Theorem 1]. In the original
structure each point is unweighted, but we wish to add the operations increment and decrement to the
structure, which respectively increase and decrease the weight of a point by one. We assume a newly inserted
point begins with weight one. Instead of returning the number of points in a query range (within an additive
error term), our query operation will return the sum of the weights of the points in the query range, within
an additive error term.

Lemma 10. Let τ > 1 be an integer constant, dpred(n) denote the cost of a dynamic predecessor search on
n keys, and m denote the sum of the weights of the points contained in the query range. There exists an O(n)
space data structure that supports approximate weighted range counting queries with additive error m^{1/τ} in
O(dpred(n) + lg lg n) time, deletions in O(lg lg n) amortized time, and insertions in O(dpred(n) + lg lg n)
amortized time. The operations increment and decrement are supported in O(lg lg n) amortized time.

5 In the preliminary version of this paper [14], the paragraph preceding Theorem 3 erroneously stated that the result
which we prove in this section follows immediately from the analysis of Karpinski and Nekrich [23].

Proof. Let m' be the approximate weight returned by our data structure, while m is the exact weight. We
divide the solution into two cases. In the first case, we assume that m > h_0 (lg n)^τ for some arbitrary constant
h_0 > 0. We emphasize that in both cases the data structure and proof are essentially the same as Theorem
1 of Nekrich [28], with some minor modifications. We use an exponential search tree T [2], where each leaf
in T represents a point, but also stores the weight associated with the point. We require some additional
notation, and closely follow that of Nekrich [28]. Let v_i denote the i-th child of v, n_v denote the number of
leaves in the subtree T(v), W(v) denote the weight of the leaves of the subtree T(v), and f(v) denote the
number of children of v. In the exponential search tree T, each node v has O(n_v^{1/τ}) children, each of which
contains between n_v^{(τ−1)/τ}/2 and 2n_v^{(τ−1)/τ} points, for a fixed constant τ > 2. Each node v stores its weight
W(v), as well as a set of approximate weights W'(v, i, j), such that

W(v_i) + ... + W(v_j) − n_v^{3/τ}/2 ≤ W'(v, i, j) ≤ W(v_i) + ... + W(v_j) + n_v^{3/τ}/2,

for all 1 ≤ i ≤ j ≤ f(v). We recompute all counts W'(v, i, j) after n_v^{3/τ}/2 update operations (insertions,
deletions, increments, and decrements). Recomputing all the W'(v, i, j)'s for a node takes O(n_v^{2/τ}) time.
Thus, each update operation (insertion, deletion, increment, or decrement) requires O(lg lg n) amortized
time, since the height of T is O(lg lg n) and we must increment or decrement the weight W(v) stored in each
node on the path from leaf to root.
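
The lazy recomputation of approximate weights at a single node can be sketched as follows. This is a simplified illustration in Python, not the structure of [28]: it keeps stale prefix sums instead of a table of all W'(v, i, j), and the class name LazySums and the parameter budget (playing the role of the number of updates allowed between recomputations) are hypothetical.

```python
class LazySums:
    """Child weights of one node, with range sums that are refreshed only periodically."""

    def __init__(self, weights, budget):
        self.w = list(weights)  # exact weights of the children
        self.budget = budget    # unit updates allowed between two rebuilds
        self.pending = 0
        self._rebuild()

    def _rebuild(self):
        # prefix[i] = w[0] + ... + w[i-1], so any range sum is a single subtraction
        self.prefix = [0]
        for x in self.w:
            self.prefix.append(self.prefix[-1] + x)
        self.pending = 0

    def update(self, i, delta):
        """Apply a unit change (delta is +1 or -1) to the weight of child i."""
        self.w[i] += delta
        self.pending += 1
        if self.pending >= self.budget:
            self._rebuild()

    def approx_sum(self, i, j):
        """Stale estimate of w[i] + ... + w[j]; off by at most budget - 1."""
        return self.prefix[j + 1] - self.prefix[i]

    def exact_sum(self, i, j):
        return sum(self.w[i:j + 1])
```

Because each update changes one weight by one, the stale estimate can drift from the exact sum by at most the number of updates since the last rebuild, which is the source of the additive error term in the displayed inequality.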

The space is linear by the properties of exponential search trees [2] , and all that remains is to argue the 
correctness of the query algorithm of Nekrich [28] . The query algorithm essentially finds the ranges in T that 
represent the query range, and returns the summation of the approximate counters of those ranges. 

For a fixed node v from the set of nodes representing the query range, with children v_i, ..., v_j contained
entirely in the query range, let m'_v = W'(v, i, j) and m_v = W(v_i) + ... + W(v_j). Then m_v − n_v^{3/τ} ≤ m'_v ≤ m_v + n_v^{3/τ}.
Since m_v ≥ n_v^{(τ−1)/τ}, we have m_v − m_v^{3/(τ−1)} ≤ m'_v ≤ m_v + m_v^{3/(τ−1)}. Since T has height h_1 lg lg n for some constant h_1,
m − (2h_1 lg lg n) m^{3/(τ−1)} ≤ m' ≤ m + (2h_1 lg lg n) m^{3/(τ−1)}. However, since we assume m > h_0 (lg n)^τ, we
need only ensure h_0 ≥ (2h_1)^τ in order for 2h_1 lg lg n ≤ m^{1/(τ−1)}. Thus, m − m^{4/(τ−1)} ≤ m' ≤ m + m^{4/(τ−1)}.
By replacing τ with τ' = max(5τ, 5) we obtain the result of the lemma.

In the second case, when m ≤ h_0 (lg n)^τ, we make use of an alternative data structure. We divide the point
set into groups of between h_0 (lg n)^τ and 4h_0 (lg n)^τ consecutive points, and store each group in a balanced
binary search tree. Each node u in the search tree stores the total weight of the nodes in the subtree induced
by u. Given the successor and predecessor, e'_1 and e'_2, of a query range [e_1, e_2], we can assume that points e'_1
and e'_2 either belong to the same group, or two adjacent groups. Thus, given e'_1 and e'_2, which can be obtained
in O(dpred(n)) time, we can tally the exact weight in this case in O(lg lg n) time. Using standard techniques
we can support insertion in O(dpred(n) + lg lg n) amortized time: O(dpred(n)) to find the position in which
to insert the new element, and O(lg lg n) amortized time to insert it into the binary search tree for its group,
accounting for merging/splitting of groups. By analogous arguments deletion takes O(lg lg n) amortized time.
Finally, increment and decrement can be performed in O(lg lg n) worst case time. □

Before continuing, we require the definition of a generalized union-split-find (GUSF) data structure, as 
well as the time bounds for its operations. 

Lemma 11 ([19], Theorem 5.2). A GUSF stores an ordered list G of elements, in which each element x
of G is associated with a subset U(x) ⊆ {1, ..., lg^ε n} of colours. Assume we have a pointer to an element
x ∈ G, and let U' ⊆ {1, ..., lg^ε n} be a set of colours. A GUSF supports the operations:

— find(x, U'): returns the successor of x with a colour c ∈ U'.

— add(y, x): inserts y into the list before x.

— erase(x): removes x from the list, assuming U(x) = ∅.

— mark(x, c): inserts c into U(x).

— unmark(x, c): removes c from U(x).

A GUSF can be implemented in O(n) space, such that each operation takes O(lg lg n) time. The time bound
for add and erase is amortized, while the running times of all other operations are worst case. A GUSF
containing n elements can be constructed in O(n lg^ε n lg lg n) time.
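
To make the semantics of the five operations concrete, here is a naive Python reference implementation. It only illustrates the interface: every operation scans a list, so none of the time bounds of the lemma hold. The class name NaiveGUSF is hypothetical, elements are assumed to be distinct hashable values, and the sketch assumes that find looks strictly after x, which is a convention chosen here rather than one stated in the lemma.

```python
class NaiveGUSF:
    """List-based stand-in for a generalized union-split-find structure."""

    def __init__(self):
        self.items = []  # the ordered list G
        self.marks = {}  # element -> set of colours U(x)

    def add(self, y, x=None):
        """Insert y before x (or append y if x is None)."""
        pos = len(self.items) if x is None else self.items.index(x)
        self.items.insert(pos, y)
        self.marks[y] = set()

    def erase(self, x):
        assert not self.marks[x], "erase requires U(x) to be empty"
        self.items.remove(x)
        del self.marks[x]

    def mark(self, x, c):
        self.marks[x].add(c)

    def unmark(self, x, c):
        self.marks[x].discard(c)

    def find(self, x, colours):
        """First element after x that carries at least one colour in `colours`, else None."""
        wanted = set(colours)
        start = self.items.index(x) + 1
        for y in self.items[start:]:
            if self.marks[y] & wanted:
                return y
        return None
```

For example, after marking an element with the colour of a child range, find called with that colour skips every element that does not belong to the range.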

Next we are ready to state and prove the main result of this section: 

Lemma 12. Let V be a set of d-dimensional points for any d ≥ 2. The point set, V, can be preprocessed
into an O((n lg^{d−1} n)/lg lg n) space data structure, such that for any arbitrary d-dimensional axis-aligned
hyperrectangle, Q, approximate range counting queries can be performed in O(lg^{d−1} n) time, with additive
error |V ∩ Q|^{1/j}, for any constant integer j > 1. Insertions and deletions can be performed in O(lg^{d−1+ε} n)
amortized time.

Proof. The proof of this lemma is almost the same as Theorem 2 in [28], except that we increase the cost
of the query by a factor of O(lg lg n) for any d ≥ 3, and decrease the space bound by a factor of O(lg^ε n).
We make use of dynamic fractional cascading in weight-balanced B-trees without modifications [19], and a
slight modification of the GUSF of Lemma 11.

We next describe how to combine the one-dimensional weighted approximate counting data structure
from Lemma 10 with the GUSF. This will allow the GUSF to support coloured approximate range counting:
i.e., given a colour c ∈ {1, ..., lg^ε n} and a range [x_a, x_b], approximately report the number of elements
with colour c contained in [x_a, x_b]. A single modified GUSF will then be stored in each internal node of a
weight-balanced B-tree, in order to support two-dimensional approximate range counting.

A GUSF groups consecutive elements into blocks which are of size O(lg^{2+ε} n). The elements in each block
are stored in a balanced binary search tree. For each node in the tree, we store the counts of the number
of children with each of the lg^ε n colours, with counter n_c storing the number of points of colour c. Since
the tree has O(lg^{2+ε} n) elements, each counter requires O(lg lg n) bits, and thus the counters for a node can
be packed into a constant number of words. Thus, these counters do not increase the space of the GUSF
structure asymptotically.

As in a standard GUSF, each block in the modified GUSF is represented in an order maintenance structure
that maps a block to an integer coordinate. Given two blocks, b and b', we denote their corresponding integer
coordinates X(b) and X(b'), and we can determine whether the elements in b precede those in b', or vice
versa, by comparing these coordinates; see [19] for full details.

Our modified GUSF also stores O(lg^ε n) copies of the data structure from Lemma 10, one for each colour
c ∈ [1, lg^ε n], denoted D_c. We discuss how to set the parameter τ for each D_c later. For each block b that has
a counter value n_c > 0 in its root for colour c, we store a point representing that block in D_c, with weight
n_c, and coordinate X(b). The root of b also stores a pointer to the leaf in D_c representing these points, for
each colour c ∈ [1, lg^ε n]. Since there are at most O(n/lg^{2+ε} n) blocks, all these structures together occupy
O(n/lg^2 n) space.

Given two elements e_1 and e_2, where e_1 < e_2 and both elements are marked with colour c, we can
determine the approximate number of points with colour c that both succeed e_1 and precede e_2, as follows.
First, if e_1 and e_2 are in the same block, we can return the exact count in O(lg lg n) time using the counters
that are stored in the nodes of the balanced binary tree representing the block. Otherwise, we need to perform
an additional step of querying the data structure D_c, providing pointers to the leaves in D_c that represent
the blocks containing e_1 and e_2, respectively.

With the exception of the data structures D_c, the GUSF containing n elements can be constructed in
O(n lg^ε n lg lg n) time by Lemma 11, since each GUSF operation takes at most O(lg lg n) amortized time.
Since each point results in an insertion or increment operation on O(lg^ε n) approximate range counting data
structures, each of size O(n/lg^{2+ε} n), this takes O(lg^ε n lg lg n) amortized time per point, and does not
asymptotically change the construction time.

We are next ready to discuss our data structure for planar approximate counting, i.e., the case in which
d = 2. We store a weight-balanced B-tree T over the y-coordinates of the given points, with branching
parameter O(lg^ε n) and leaf parameter 1. For each internal node u of T with degree f, we store our modified
GUSF M(u), over all of the points in the subtree T(u), ordered by their x-coordinate. Note that there are
O(lg^ε n) possible contiguous subranges of children of u in total, and each child of u belongs to O(lg^ε n)
of these ranges [i, j], where 1 ≤ i ≤ j ≤ f. We construct a set of colours, and each colour corresponds
to a possible range [i, j]. Thus, each point in M(u) is marked with the O(lg^ε n) colours corresponding to
these ranges. Each node u also stores a catalogue V(u) corresponding to the points in T(u) ordered by
x-coordinate. Each catalogue element stores a pointer to the corresponding element in M(u). We maintain
a dynamic fractional cascading data structure over the catalogues of T.

Since the branching parameter of the tree T is O(lg^ε n), the tree has height O(lg n/lg lg n). Each point
is stored in O(lg n/lg lg n) nodes, each containing a constant number of linear space structures. Thus, the
space occupied by the data structure is O(n lg n/lg lg n).

To answer a query of the form [x_1, x_2] × [y_1, y_2], we perform a search for the successor and predecessor,
e_1 and e_2, of the query range [x_1, x_2] in each catalogue of each node among the nodes representing [y_1, y_2]
in T. This takes O(lg n) time, since there are O(lg n/lg lg n) catalogues: the initial search requires O(lg n)
time, and each additional search uses O(lg lg n) time. For a fixed internal node u of T, such that the query
range [y_1, y_2] spans children [i, j], let c be the colour representing [i, j] in M(u). We locate e'_1 and e'_2, the
respective successor and predecessor of e_1 and e_2 in M(u) with colour c, using the find operation. Thus,
locating e'_1 and e'_2 in M(u) takes O(lg lg n) time. Once we have located e'_1 and e'_2 we can query D_c, and the
counters in the block(s) containing e'_1 and e'_2, in O(lg lg n) time, as outlined above. Thus, the overall query
time is O(lg lg n (lg n/lg lg n) + lg n), which is O(lg n).

Suppose we desire additive error |V ∩ Q|^{1/j} for some fixed constant j > 1. Then, we set the parameter
τ = 2j. Let m' denote the sum of each of the h_2 (lg n/lg lg n) approximate counts tallied at each node that
represents the query range, where h_2 is a constant that depends on the height of T, and m denotes the exact
count. Thus,

m − h_2 (lg n/lg lg n) m^{1/τ} ≤ m' ≤ m + h_2 (lg n/lg lg n) m^{1/τ}.    (1)

If m > (h_2 lg n)^τ, then m^{1/τ} > h_2 (lg n/lg lg n). Thus, m − m^{2/τ} ≤ m' ≤ m + m^{2/τ}. Since τ = 2j, we are left
with m − m^{1/j} ≤ m' ≤ m + m^{1/j}, which is the desired error term.

Next suppose m ≤ (h_2 lg n)^τ. In this case, we can retrieve the exact count in O(lg n) time using the binary
tree representation of D_c, since none of the ranges represented can contain more than m points. Thus, in
both cases we have shown the query algorithm is correct. Note that we must ensure h_0 ≥ h_2, in addition to
the constraints on h_0 described in Lemma 10.

In order to insert a point p, we identify the nodes on the path Y from the root of T to the leaf where
p will be inserted. We then search for the successors of p in all of the catalogues on this path, which takes
O(lg n) time in total. Once we have the successors, we can insert p into each GUSF along Y in the following
way. Let u be a node in Y and u_i be the child of u whose range contains p. Using the pointer to the successor
of p in M(u), we can perform an add operation, inserting p into a block b in M(u). Let U' denote the set of
colours in M(u) representing the ranges that contain u_i.

If b splits into two blocks b and b' as a result of this, then we must decrement the weight of the element
representing b in each D_c for each c with a non-zero counter in the root of b. We also must insert a new
element representing b' into each D_c for each c with a non-zero counter in the root of b', and increment
its weight accordingly. Recall that O(lg^{2+ε} n) elements must have been inserted into M(u) to cause b to
split. Each split causes O(lg^ε n (lg^{2+ε} n)) update operations on all D_c, for each c stored in the roots of b
and b'. This is O(lg^ε n lg lg n) amortized update time. In the case in which b does not split, we still require
O(lg^ε n lg lg n) amortized time to increase the weight of the representative of b in each D_c, for c ∈ U'.
Since there are O(lg n/lg lg n) nodes in Y, the overall insertion time is thus O(lg^{1+ε} n), provided we do not
cause a node to split or two nodes to merge in the base tree T. Deletion is handled analogously, except that
we decrease the weight of the representative of b in each D_c for c ∈ U'. Thus, deletion requires O(lg^{1+ε} n)
amortized time as well.

In the case in which a node u ∈ T splits or merges, we can efficiently update the fractional cascading
data structure using the techniques described in [19]. The cost of a split or merge is dominated by the cost of
rebuilding the modified GUSF structures in both u and u's parent. We can rebuild each modified GUSF in a
node representing m points in O(m lg^ε n lg lg n) time. Since O(m) updates are required to split a node with
a parent containing O(m lg^ε n) points, and O(lg n/lg lg n) splits/merges can occur during a single update,
the cost of performing an update is O(lg^{1+ε} n) amortized time.

To get the bound stated by the lemma for higher dimensions, we use the standard range tree technique [4],
which inflates the space, query and update time by a factor of O(lg n) for each additional dimension. In
general, we must set the parameter τ = 2^{d−1} j. □

4.2 Range α-Majority in Higher Dimensions

As an application of Theorem 1 and the approximate range counting data structures of Lemma 12, we can
improve the query time for range α-majority data structures in higher dimensions.

Theorem 3. Given a set V of n points in d dimensions (for a constant d ≥ 2) and a fixed α ∈ (0, 1), there
is an O(n lg^{d−1} n) space data structure that supports range α-majority queries on V in O((lg^d n)/α) time,
and insertions and deletions into V in O((lg^d n)/α) amortized time.

Proof. Using range trees, we can convert any d-dimensional range α-majority query into a combination of
several (d − 1)-dimensional range α-majority queries and d-dimensional range counting queries. In particular,
let S(n, d) denote the cost of a d-dimensional range counting query on a dynamic set of n points, and M(n, d)
denote the cost of a d-dimensional range α-majority query on a dynamic set of n points. Then, for d ≥ 2,

M(n, d) = O(lg n) M(n, d − 1) + O((lg n)/α) S(n, d),    (2)

since we can extract and verify the frequency of the O((lg n)/α) candidates from the O(lg n) nodes representing
the range spanned by the d-th coordinate of the query range. Note that each candidate is an α-majority
if we consider their first (d − 1) coordinates only. Since d-dimensional dynamic orthogonal range counting
queries require O(lg^d n) time [5], M(n, d) = O((lg^{d+1} n)/α).

To further reduce the query time we observe that only O(1/α) of the O((lg n)/α) candidates can have
frequency above (1 − ε)αq, where q is the total number of points contained in the query range and ε ∈ (0, 1)
is an arbitrary constant. Thus, we add additional data structures F'_c for each c ∈ C, where each F'_c is the
structure of Lemma 12 and stores the points of colour c.^6 Using these data structures, we can perform an
additional filtering pass of the list of O((lg n)/α) candidates into a shorter list of O(1/α) candidates. After this
filtering step we can then verify the frequency of the O(1/α) candidates above this threshold exactly using
range counting data structures F_c. Let S'(n, d) denote the query time of the data structure from Lemma 12.
We can rewrite the recurrence of Equation (2) as:

M(n, d) = O(lg n) M(n, d − 1) + O((lg n)/α) S'(n, d) + O(1/α) S(n, d).    (3)

This recurrence resolves to M(n, d) = O((lg^d n)/α). We can update the structures F_c and F'_c for the colour,
c, of the inserted or deleted point, and F in O(lg^d n + lg^{d−1+ε} n) amortized time. Each of the O(lg n)
(d − 1)-dimensional range α-majority data structures can be updated in O((lg^{d−1} n)/α) amortized time, for a total
of O((lg^d n)/α) amortized time. Finally, the space cost is dominated by the range α-majority structures. The
space occupied by this can be expressed as U(n, d) = O(lg n) U(n, d − 1), where U(n, d) is the space occupied
by a d-dimensional dynamic range α-majority structure, and d ≥ 2. Thus U(n, d) = O(n lg^{d−1} n). □
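
The extraction, filtering and verification phases used in this proof can be sketched in Python as follows. This is an illustration only: a list of colours stands in for the points in the query range, a Counter stands in for the exact range counting structures, and the callable approx_count (a hypothetical parameter) stands in for the approximate counting structures, assumed here to return a value between (1 − eps) times the true count and the true count.

```python
from collections import Counter


def report_majorities(colours_in_range, candidates, alpha, approx_count, eps=0.5):
    """Filter candidates with cheap approximate counts, then verify survivors exactly."""
    q = len(colours_in_range)
    exact = Counter(colours_in_range)  # stand-in for exact range counting

    # Filtering phase: a true alpha-majority has more than alpha*q occurrences, so its
    # approximate count exceeds (1 - eps)*alpha*q and it is never discarded here.
    survivors = [c for c in candidates if approx_count(c) > (1 - eps) * alpha * q]

    # Verification phase: only the survivors pay for an exact count.
    majorities = sorted(c for c in survivors if exact[c] > alpha * q)
    return survivors, majorities
```

The point of the intermediate pass is that the expensive exact counts are issued only for the candidates that pass the cheap approximate test, which is what the third term of Equation (3) accounts for.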

6 The data structure of Lemma 12 has very small additive error, though, for the purposes of this proof, we need only
constant multiplicative error.

5 Dynamic Arrays

In this section we extend our results to dynamic arrays. In the dynamic array version of the problem, we
wish to support the following operations on an array A of length n, where each A[i] contains a colour, for
1 ≤ i ≤ n:

— Insert(i, c): Insert the colour c between the colours A[i − 1] and A[i]. This shifts the colours in positions
i to n to positions i + 1 to n + 1, respectively.

— Delete(i): Delete the colour A[i]. This shifts the colours in positions i + 1 to n to positions i to n − 1,
respectively.

— Modify(i, c): Set the colour of A[i] to c.

— Query(i, j): Let |A[i..j]|_c denote the number of occurrences of colour c in the range A[i..j]. Report the
set of colours M such that for each c ∈ M, |A[i..j]|_c > α|j − i + 1|. As before, we refer to a colour c ∈ M
as an α-majority in the range A[i..j], and the query as a range α-majority query.
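
The semantics of these four operations can be pinned down with a brute-force Python reference implementation. It is meant only as a specification to test against, with linear-time operations; the class name NaiveColourArray is hypothetical, and positions are 1-indexed as in the text.

```python
from collections import Counter


class NaiveColourArray:
    """Brute-force reference for the dynamic array operations (1-indexed positions)."""

    def __init__(self, colours=()):
        self.a = list(colours)

    def insert(self, i, c):
        """Insert colour c before position i, shifting later entries one place right."""
        self.a.insert(i - 1, c)

    def delete(self, i):
        """Delete the colour at position i, shifting later entries one place left."""
        del self.a[i - 1]

    def modify(self, i, c):
        self.a[i - 1] = c

    def query(self, i, j, alpha):
        """Colours occurring more than alpha * (j - i + 1) times in A[i..j]."""
        window = self.a[i - 1:j]
        threshold = alpha * len(window)
        return {c for c, k in Counter(window).items() if k > threshold}
```

Note that the threshold is strict: a colour occupying exactly an α-fraction of the range is not reported.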

The dynamic array problem boils down to the well-studied problem of maintaining an injective
order-preserving mapping from the positions in A into a larger set of integer keys V [37]. We next prove the
following theorem:

Theorem 4. Given an array A[1..n] of colours and a fixed α ∈ (0, 1), there is an O(n) space data structure
that supports range α-majority queries on A in O((lg n)/(α lg lg n)) time, Insert in O((lg^3 n)/(α lg lg n))
amortized time, Delete in O((lg n)/α) amortized time, and Modify in O((lg n)/α) amortized time.

Proof. We maintain our data structure T from Theorem 2 on the integer key set V. Each time a key p in
V is changed to key p', we must delete p from T, and then insert p' into T. If an insertion or deletion into
our dynamic array changes ℓ keys in the mapping, it will require O((ℓ lg n)/α) amortized time to change
these values in T. We note that a Modify operation corresponds to one deletion and one insertion into T,
requiring O((lg n)/α) amortized time.

We apply the dynamic reduction to extended rank space technique [27], which maps the positions in A
to integers in the bounded universe [1..O(n^3)]. This mapping requires O((lg^2 n)/lg lg n) amortized time for
insertions, and O(1) amortized time for deletions. These time bounds also bound the number of key changes
for insertion and deletion (in the amortized sense), completing the proof. □
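
The role of key changes in this argument can be illustrated with a deliberately naive order-preserving labelling in Python. It is not the technique of [27] and does not achieve its bounds: it places a new key at the midpoint of the free gap between its neighbours and, when no gap is left, relabels every key evenly. The class name GapLabels and its return-value convention (the number of keys written by an insertion) are assumptions made for this sketch.

```python
class GapLabels:
    """Naive order-preserving map from array positions to integer keys in [0, universe)."""

    def __init__(self, universe):
        self.universe = universe
        self.labels = []  # labels[i] is the key of array position i (0-indexed here)

    def insert(self, i):
        """Create a key for a new element at position i; return how many keys were written."""
        lo = self.labels[i - 1] if i > 0 else -1
        hi = self.labels[i] if i < len(self.labels) else self.universe
        if hi - lo >= 2:
            # A free key exists strictly between the two neighbours.
            self.labels.insert(i, (lo + hi) // 2)
            return 1
        # No free key: spread all keys, including the new one, evenly over the universe.
        n = len(self.labels) + 1
        assert n < self.universe, "universe too small"
        self.labels = [(k + 1) * self.universe // (n + 1) for k in range(n)]
        return n

    def delete(self, i):
        """Remove the key at position i; no other key changes."""
        del self.labels[i]
```

Every key written by insert corresponds to one deletion and one insertion in the structure of the proof, so the amortized number of key changes directly multiplies the cost of an update.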

6 Conclusions 

We have presented several new dynamic data structures for the range α-majority problem. These data
structures improve on the previous results in terms of space, query, and update time.

Notably, for one-dimensional points, we presented a linear space data structure with O((lg n)/α) query
time, and O((lg n)/α) amortized update time. In the case in which the coordinates of the points are integers,
we reduced the query time by a (lg lg n)-factor. This improved query time matches an existing cell-probe
lower bound, for the case when 1/α is a constant, and the word size is Θ(lg n).

We also extended our one-dimensional data structure to d dimensions, where d ≥ 2 is an arbitrary
constant. The generalized structure occupies O(n lg^{d−1} n) space, has O((lg^d n)/α) query time, and supports
updates in O((lg^d n)/α) amortized time. It would be interesting to determine if either the space or query time
can be improved for the higher dimensional data structure.